Simplex Deep Linear Discriminant Analysis
Tezekbayev, Maxat, Bolatov, Arman, Assylbekov, Zhenisbek
We revisit Deep Linear Discriminant Analysis (Deep LDA) from a likelihood-based perspective. While classical LDA is a simple Gaussian model with linear decision boundaries, attaching an LDA head to a neural encoder raises the question of how to train the resulting deep classifier by maximum likelihood estimation (MLE). We first show that end-to-end MLE training of an unconstrained Deep LDA model ignores discrimination: when both the LDA parameters and the encoder parameters are learned jointly, the likelihood admits a degenerate solution in which some of the class clusters may heavily overlap or even collapse, and classification performance deteriorates. Batchwise moment re-estimation of the LDA parameters does not remove this failure mode. We then propose a constrained Deep LDA formulation that fixes the class means to the vertices of a regular simplex in the latent space and restricts the shared covariance to be spherical, leaving only the priors and a single variance parameter to be learned along with the encoder. Under these geometric constraints, MLE becomes stable and yields well-separated class clusters in the latent space. On images (Fashion-MNIST, CIFAR-10, CIFAR-100), the resulting Deep LDA models achieve accuracy competitive with softmax baselines while offering a simple, interpretable latent geometry that is clearly visible in two-dimensional projections.
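The constrained head described in the abstract can be sketched in a few lines of numpy: a minimal construction of regular-simplex class means plus the resulting class scores under a single shared spherical variance (function and variable names here are ours, not the authors'; this is a sketch of the geometry, not their training code).

```python
import numpy as np

def simplex_vertices(k):
    """k centered, unit-norm vertices of a regular simplex, embedded in R^k."""
    V = np.eye(k) - 1.0 / k                           # center the standard basis
    return V / np.linalg.norm(V, axis=1, keepdims=True)

def deep_lda_logits(z, mu, sigma2, log_priors):
    """Class log-posteriors (up to a constant) for latent codes z under
    fixed simplex means mu and a shared spherical variance sigma2."""
    sq = ((z[:, None, :] - mu[None, :, :]) ** 2).sum(-1)  # (n, k) squared distances
    return log_priors - sq / (2.0 * sigma2)

k = 4
mu = simplex_vertices(k)
G = mu @ mu.T   # off-diagonal entries are all -1/(k-1): equiangular class means
z = mu + 0.01 * np.random.default_rng(0).normal(size=mu.shape)  # codes near means
pred = deep_lda_logits(z, mu, 0.1, np.log(np.full(k, 1.0 / k))).argmax(1)
```

With equal priors and fixed spherical covariance, classification reduces to nearest-centroid assignment against the simplex vertices, which is what makes the latent geometry easy to read off in low-dimensional projections.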
Automated Sentiment Classification and Topic Discovery in Large-Scale Social Media Streams
Lu, Yiwen, Xiong, Siheng, Li, Zhaowei
We present a framework for large-scale sentiment and topic analysis of Twitter discourse. Our pipeline begins with targeted data collection using conflict-specific keywords, followed by automated sentiment labeling via multiple pre-trained models to improve annotation robustness. We examine the relationship between sentiment and contextual features such as timestamp, geolocation, and lexical content. To identify latent themes, we apply Latent Dirichlet Allocation (LDA) on partitioned subsets grouped by sentiment and metadata attributes. Finally, we develop an interactive visualization interface to support exploration of sentiment trends and topic distributions across time and regions. This work contributes a scalable methodology for social media analysis in dynamic geopolitical contexts.
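The topic-discovery step can be illustrated with a minimal collapsed Gibbs sampler for LDA in pure numpy (a toy sketch for intuition; the pipeline above presumably uses a standard library implementation):

```python
import numpy as np

def lda_gibbs(docs, n_topics, n_vocab, alpha=0.1, beta=0.01, n_iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; docs are lists of integer word ids."""
    rng = np.random.default_rng(seed)
    ndk = np.zeros((len(docs), n_topics))    # document-topic counts
    nkw = np.zeros((n_topics, n_vocab))      # topic-word counts
    nk = np.zeros(n_topics)                  # tokens per topic
    z = [rng.integers(n_topics, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):           # initialize count tables
        for w, t in zip(doc, z[d]):
            ndk[d, t] += 1; nkw[t, w] += 1; nk[t] += 1
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]                  # remove token's current assignment
                ndk[d, t] -= 1; nkw[t, w] -= 1; nk[t] -= 1
                # full conditional: (doc-topic + alpha) * smoothed topic-word prob
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + n_vocab * beta)
                t = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = t
                ndk[d, t] += 1; nkw[t, w] += 1; nk[t] += 1
    return ndk, nkw

# two disjoint vocabularies -> the sampler should recover two clean topics
docs = [[0, 1, 0, 1, 1, 0]] * 5 + [[2, 3, 2, 3, 3, 2]] * 5
ndk, nkw = lda_gibbs(docs, n_topics=2, n_vocab=4)
```

Partitioning the corpus by sentiment and metadata before fitting, as in the paper, amounts to running this (or a library equivalent) on each subset separately.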
Dynamic Topic Analysis in Academic Journals using Convex Non-negative Matrix Factorization Method
Yang, Yang, Zhang, Tong, Wu, Jian, Su, Lijie
With the rapid advancement of large language models, academic topic identification and topic evolution analysis are crucial for enhancing AI's understanding capabilities. Dynamic topic analysis provides a powerful approach to capturing and understanding the temporal evolution of topics in large-scale datasets. This paper presents a two-stage dynamic topic analysis framework that incorporates convex optimization to improve topic consistency, sparsity, and interpretability. In Stage 1, a two-layer non-negative matrix factorization (NMF) model is employed to extract annual topics and identify key terms. In Stage 2, a convex optimization algorithm refines the dynamic topic structure using the convex NMF (cNMF) model, further enhancing topic integration and stability. Applying the proposed method to IEEE journal abstracts from 2004 to 2022 effectively identifies and quantifies emerging research topics, such as COVID-19 and digital twins. By optimizing sparsity differences in the clustering feature space between traditional and emerging research topics, the framework provides deeper insights into topic evolution and ranking analysis. Moreover, the NMF-cNMF model demonstrates superior stability in topic consistency. At sparsity levels of 0.4, 0.6, and 0.9, the proposed approach improves topic ranking stability by 24.51%, 56.60%, and 36.93%, respectively. The source code (to be released after publication) is available at https://github.com/meetyangyang/CDNMF.
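The NMF building block underlying both stages can be sketched with the classic Lee-Seung multiplicative updates (a generic sketch only; the paper's two-layer and convex cNMF variants add structure on top of this basic factorization):

```python
import numpy as np

def nmf(X, r, n_iters=500, eps=1e-9, seed=0):
    """Frobenius-norm NMF, X ~ W @ H, via Lee-Seung multiplicative updates."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W, H = rng.random((m, r)), rng.random((r, n))
    for _ in range(n_iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)   # update topic-term factors
        W *= (X @ H.T) / (W @ H @ H.T + eps)   # update document-topic weights
    return W, H

rng = np.random.default_rng(1)
X = rng.random((20, 2)) @ rng.random((2, 15))  # exactly rank-2, non-negative
W, H = nmf(X, r=2)
err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
```

Multiplicative updates keep the factors non-negative by construction, which is why the rows of H are directly interpretable as topics.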
Topic mining based on fine-tuning Sentence-BERT and LDA
Research background: As society develops, consumers pay increasing attention to fine-grained product attributes when shopping. Research purpose: This study fine-tunes the Sentence-BERT word embedding model and the LDA model to mine topic features in online product reviews and show consumers the details of various aspects of products. Research methods: First, the Sentence-BERT model is fine-tuned on e-commerce online reviews, converting review texts into word vector sets with richer semantic information; second, the vectorized word sets are fed into the LDA model for topic feature extraction; finally, keyword analysis within each topic highlights the key functions of a product. Results: Compared with other word embedding models, other LDA variants, and common topic extraction methods, the proposed model achieves topic coherence 0.5 higher than the alternatives, improving the accuracy of topic extraction.
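The final keyword-analysis step, reading off the highest-weight terms per topic, is simple to sketch; the topic-word matrix and vocabulary below are illustrative stand-ins, not the paper's data:

```python
import numpy as np

def top_keywords(topic_word, vocab, k=3):
    """Top-k vocabulary terms per topic, ordered by descending weight."""
    return [[vocab[i] for i in np.argsort(row)[::-1][:k]] for row in topic_word]

vocab = ["battery", "screen", "price", "shipping", "camera"]
topic_word = np.array([[0.40, 0.30, 0.05, 0.05, 0.20],   # a hardware topic
                       [0.05, 0.05, 0.50, 0.35, 0.05]])  # a purchase topic
kw = top_keywords(topic_word, vocab)
```

Presenting these per-topic keyword lists is what lets consumers see the fine-grained product aspects the abstract mentions.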
Analysis of Variational Bayesian Latent Dirichlet Allocation: Weaker Sparsity Than MAP
Nakajima, Shinichi, Sato, Issei, Sugiyama, Masashi, Watanabe, Kazuho, Kobayashi, Hiroko
Latent Dirichlet allocation (LDA) is a popular generative model of various objects such as texts and images, where an object is expressed as a mixture of latent topics. In this paper, we theoretically investigate variational Bayesian (VB) learning in LDA. More specifically, we analytically derive the leading term of the VB free energy under an asymptotic setup, and show that there exist transition thresholds in Dirichlet hyperparameters around which the sparsity-inducing behavior drastically changes. Then we further theoretically reveal the notable phenomenon that VB tends to induce weaker sparsity than MAP in the LDA model, which is opposed to other models. We experimentally demonstrate the practical validity of our asymptotic theory on real-world Last.FM music data.
Export Reviews, Discussions, Author Feedback and Meta-Reviews
We thank the reviewers for their time and their valuable feedback. For the final version, we will improve the presentation as suggested. We address below some of their additional concerns. R2: [contributions] Indeed, the method of moments is not new and has already been applied to ICA and LDA (see refs in the paper). However, given the popularity of LDA in machine learning, we believe that our combination of several threads in this setting makes valuable contributions.
Topic Modeling in Marathi
Shinde, Sanket, Joshi, Raviraj
While topic modeling in English has become a prevalent and well-explored area, venturing into topic modeling for Indic languages remains relatively rare. The limited availability of resources, diverse linguistic structures, and unique challenges posed by Indic languages contribute to the scarcity of research and applications in this domain. Despite the growing interest in natural language processing and machine learning, there exists a noticeable gap in the comprehensive exploration of topic modeling methodologies tailored specifically for languages such as Hindi, Marathi, Tamil, and others. In this paper, we examine several topic modeling approaches applied to the Marathi language. Specifically, we compare various BERT and non-BERT approaches, including multilingual and monolingual BERT models, using topic coherence and topic diversity as evaluation metrics. Our analysis provides insights into the performance of these approaches for Marathi language topic modeling. The key finding of the paper is that BERTopic, when combined with BERT models trained on Indic languages, outperforms LDA in terms of topic modeling performance.
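Topic coherence, one of the two evaluation metrics used above, can be computed directly from document co-occurrence counts. A minimal UMass-coherence sketch in pure Python (the paper's experiments presumably use a library implementation; the toy corpus here is ours):

```python
import math

def umass_coherence(top_words, docs):
    """UMass coherence: sum of log smoothed co-document frequencies over
    word pairs, with top_words ordered by decreasing topic probability."""
    doc_sets = [set(d) for d in docs]
    df = lambda w: sum(w in s for s in doc_sets)              # document frequency
    codf = lambda a, b: sum(a in s and b in s for s in doc_sets)
    score = 0.0
    for j in range(1, len(top_words)):
        for i in range(j):
            score += math.log((codf(top_words[i], top_words[j]) + 1)
                              / df(top_words[i]))
    return score

docs = [["topic", "model", "lda"], ["topic", "model"], ["cricket", "score"]]
good = umass_coherence(["topic", "model"], docs)    # words that co-occur
bad = umass_coherence(["topic", "cricket"], docs)   # words that never co-occur
```

Higher (less negative) scores indicate that a topic's top words tend to appear in the same documents, which is the property used to compare the BERT and non-BERT approaches.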
Reliability of Topic Modeling
Schroeder, Kayla, Wood-Doughty, Zach
Topic models allow researchers to extract latent factors from text data and use those variables in downstream statistical analyses. However, these methodologies can vary significantly due to initialization differences, randomness in sampling procedures, or noisy data. Reliability of these methods is of particular concern as many researchers treat learned topic models as ground truth for subsequent analyses. In this work, we show that the standard practice for quantifying topic model reliability fails to capture essential aspects of the variation in two widely-used topic models. Drawing from an extensive literature on measurement theory, we provide empirical and theoretical analyses of three other metrics for evaluating the reliability of topic models. On synthetic and real-world data, we show that McDonald's $\omega$ provides the best encapsulation of reliability. This metric provides an essential tool for validation of topic model methodologies that should be a standard component of any topic model-based research.
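McDonald's $\omega$ is the ratio of common-factor variance to total variance under a one-factor model. A rough sketch that approximates the factor loadings with the leading eigenvector of the covariance (an assumption on our part; a proper estimate fits the one-factor model by maximum likelihood):

```python
import numpy as np

def mcdonalds_omega(X):
    """Approximate McDonald's omega for items in the columns of X,
    using first-principal-component loadings as factor loadings."""
    C = np.cov(X, rowvar=False)
    vals, vecs = np.linalg.eigh(C)
    lam = np.sqrt(vals[-1]) * np.abs(vecs[:, -1])      # approximate loadings
    theta = np.clip(np.diag(C) - lam ** 2, 0.0, None)  # residual item variances
    return lam.sum() ** 2 / (lam.sum() ** 2 + theta.sum())

rng = np.random.default_rng(0)
f = rng.normal(size=(500, 1))                          # shared latent factor
consistent = f @ np.ones((1, 5)) + 0.3 * rng.normal(size=(500, 5))
noisy = rng.normal(size=(500, 5))                      # no shared factor
```

Items driven by a common factor (e.g. repeated topic-model runs measuring the same latent structure) yield $\omega$ near 1, while unrelated items yield much lower values.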